Abstract

A Global Value Numbering (GVN) algorithm is considered to be complete
(or precise) if it can detect all Herbrand equivalences among expressions
in a program. A polynomial time algorithm for GVN was presented by
Gulwani and Necula (2006). Here we present two problems with this
algorithm that prevent detection of some of the Herbrand equivalences
among program expressions. We suggest improvements that make the
algorithm more precise and show that the running time of the modified
algorithm remains polynomial in the number of expressions in the
program.

1 Introduction 

Global Value Numbering (GVN) is a method for detecting equivalence among
expressions in a program. A GVN algorithm is considered to be complete
(or precise) if it can detect all Herbrand equivalences among program
expressions. Two expressions are said to be Herbrand equivalent (or
transparent equivalent) if they are computed by the same operator
applied to equivalent operands [3, 6].

Kildall's GVN algorithm [4] is complete in detecting all Herbrand
equivalences among program expressions. Gulwani and Necula [3] present a
polynomial time algorithm for GVN. It uses a data structure called the
Strong Equivalence Dag (SED) to represent the structured partitions of
Kildall [4]. We have observed two problems with this algorithm that
prevent detection of some of the Herbrand equivalences (among program
expressions) that Kildall's algorithm detects. In the next section, we
present two examples to demonstrate the problems. We suggest possible
improvements that make the algorithm more precise. Our analysis shows
that the running time of the modified algorithm remains polynomial in
the number of expressions in the program.
2 GVN algorithm by Gulwani and Necula [3]
2.1 Problem 1: Join algorithm 



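    Program (flow graph):
      top node:     x := 1; y := 2; z := x + y;
      left branch:  z := 3; c := x + y;      (program point p1)
      right branch: z := 4; d := x + y;      (program point p2)
      join point:   p3, followed by a bottom node containing x + y

    SEDs:
      G1: <d, ⊥>  <z, 3>  <c, +> with operands <x, 1> and <y, 2>
      G2: <c, ⊥>  <z, 4>  <d, +> with operands <x, 1> and <y, 2>
      G3: <x, 1>  <y, 2>  <d, ⊥>  <c, ⊥>  <z, ⊥>

    Structured partitions:
      E1: { [d], [x, 1], [y, 2], [z, 3], [c, x + y, 1 + y, x + 2, 1 + 2] }
      E2: { [c], [x, 1], [y, 2], [z, 4], [d, x + y, 1 + y, x + 2, 1 + 2] }
      E3: { [c], [d], [z], [x, 1], [y, 2], [x + y, 1 + y, x + 2, 1 + 2] }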



Figure 1: Join of SEDs: for program point p3, G3 is the SED that Gulwani and Necula
[3] compute and E3 is the optimizing pool that Kildall [4] computes. The expression
x + y and its equivalent expressions in E3 are not represented in the SED G3.

Figure 1 shows four program nodes and a join point.¹ G1 and G2 are the
SEDs at program points p1 and p2 respectively. E1 and E2 are the structured
partitions that Kildall [4] computes at these points. G3 is the SED resulting
from the join of the SEDs G1 and G2. The corresponding partition in Kildall
[4] is E3, which is the result of the meet of E1 and E2.

As per the definition of Herbrand equivalence of expressions given by
Ruthing, Knoop, and Steffen [6] (see the definition at the end of section 2), the
expression x + y in the topmost node is Herbrand equivalent to the expression
x + y in the bottommost node. Since x + y is present in E3, using Kildall's
algorithm [4] we can deduce that whenever control reaches p3,
an expression equivalent to x + y has already been computed. But there is no way
to deduce this information from the corresponding SED G3. Hence the GVN
algorithm by Gulwani and Necula [3] fails to detect the Herbrand equivalence
in this example.

¹ For convenience, we use x + y instead of F(x, y).
2.1.1 A solution



At a join point, the meet operation in Kildall's algorithm intersects every pair
of classes that have at least one common expression, whereas the Join
algorithm in [3] intersects only those SED nodes that have at least one
common variable (see line 3 of the Join algorithm: for each variable x ∈ T ...
Intersect(Node_G1(x), Node_G2(x));). Hence, a solution that enables the
algorithm to detect these kinds of equivalences is to modify the Join algorithm
so that it computes the intersection of every pair of nodes in the
two SEDs. In Figure 2, SED G3 shows the result of computing Join using the
proposed method. The intersection of <c, +> in G1 and <d, +> in G2 results
in the node <-, +> in G3, which represents x + y and its equivalent
expressions. Note that nodes like <-, +>, which have an empty set of variables,
are considered unnecessary by Gulwani and Necula [3]. In fact they are
necessary (as will be shown in the next section), and hence the proposed method
retains such nodes.
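The difference between the two strategies can be sketched on the partitions E1 and E2 of Figure 1. This is a simplified model, not the SED data structure of [3]: each class is assumed to be a flat set of expression strings, and the function names `meet_all_pairs` and `meet_by_variable` are hypothetical names chosen for this illustration.

```python
from itertools import product

def meet_all_pairs(p1, p2):
    """Kildall-style meet: intersect every pair of classes, keep non-empty results."""
    return {a & b for a, b in product(p1, p2) if a & b}

def meet_by_variable(p1, p2, variables):
    """Variable-driven join: intersect only the classes that hold a given variable."""
    result = set()
    for v in variables:
        a = next(c for c in p1 if v in c)  # class of v in the first partition
        b = next(c for c in p2 if v in c)  # class of v in the second partition
        result.add(a & b)
    return result

# Partitions E1 and E2 from Figure 1, modeled as sets of frozensets (an assumption).
SUMS = {"x+y", "1+y", "x+2", "1+2"}
E1 = {frozenset({"d"}), frozenset({"x", "1"}), frozenset({"y", "2"}),
      frozenset({"z", "3"}), frozenset({"c"} | SUMS)}
E2 = {frozenset({"c"}), frozenset({"x", "1"}), frozenset({"y", "2"}),
      frozenset({"z", "4"}), frozenset({"d"} | SUMS)}
VARIABLES = ["x", "y", "z", "c", "d"]

E3 = meet_all_pairs(E1, E2)
G3_like = meet_by_variable(E1, E2, VARIABLES)

print(frozenset(SUMS) in E3)       # True: the class of x + y survives the meet
print(frozenset(SUMS) in G3_like)  # False: no variable leads to that class
```

In this model the only class that the pairwise meet produces and the variable-driven join misses is the variable-free class [x + y, 1 + y, x + 2, 1 + 2], which corresponds to the node <-, +> discussed above.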



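    Program (flow graph):
      top node:     x := 1; y := 2; z := x + y;
      left branch:  z := 3; c := x + y;
      right branch: z := 4; d := x + y;
      bottom node:  e := x + y;

    SEDs:
      G1: <d, ⊥>  <z, 3>  <c, +> with operands <x, 1> and <y, 2>
      G2: <c, ⊥>  <z, 4>  <d, +> with operands <x, 1> and <y, 2>
      G3: <z, ⊥>  <d, ⊥>  <c, ⊥>  <-, +> with operands <x, 1> and <y, 2>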



Figure 2: Join of SEDs: pairwise intersection of nodes 



2.2 Problem 2: Removal of SED nodes 

Figure 3 shows a basic block in a program with the SEDs G1 and G2 at program
points p1 and p2 respectively. Here the expression x + y and the two occurrences
of a + b are equivalent, and this equivalence is detected by Kildall's algorithm
(and also by the local value numbering algorithm [2]). But it goes undetected in
Gulwani and Necula [3] for the following reasons.

In section 3.1 of Gulwani and Necula [3], it is stated that the transfer
functions may yield SEDs with unnecessary nodes, and that these unnecessary
nodes may be removed (a node is considered unnecessary when all its ancestor
nodes or all its descendant nodes have an empty set of variables). Also, it is
stated in section 5.1 that the data structure (SED) explicitly represents only
those partition classes that have at least one variable. Accordingly, in G2 of
Figure 3 the three nodes <-, 1>, <-, 2> and <-, +> are unnecessary and hence
will be removed. G'2 is the SED resulting after removal of these unnecessary
nodes from G2.

    Basic block:
      x := 1; y := 2;
      c := x + y;
      (program point p1)
      x := 3; y := 4;
      c := 5;
      (program point p2)
      a := 1; b := 2;
      d := a + b;
      d := 5;
      d := a + b;

    SEDs:
      G1:  <a, ⊥>  <b, ⊥>  <d, ⊥>  <c, +> with operands <x, 1> and <y, 2>
      G2:  <a, ⊥>  <b, ⊥>  <d, ⊥>  <x, 3>  <y, 4>  <c, 5>
           <-, +> with operands <-, 1> and <-, 2>
      G'2: <a, ⊥>  <b, ⊥>  <d, ⊥>  <x, 3>  <y, 4>  <c, 5>

Figure 3: Removal of "unnecessary" nodes: G1 and G2 are the SEDs at points p1 and
p2 respectively. G'2 is the SED resulting after removal of "unnecessary" nodes from
G2.

It can be observed that the node <-, +> in G2 represents the expression
1 + 2, which is equivalent to x + y and a + b. With the removal of this node,
we lose the information that the expressions x + y and a + b are equivalent.
Similarly, since the variable d is redefined after the first assignment d := a + b,
the equivalence between the two occurrences of a + b goes undetected.

2.2.1 The solution 

From the above example, it is clear that the problem is due to the removal
of some necessary nodes, which the algorithm considers unnecessary. The
simple solution is to retain all such nodes. In that case, for the above example,
the SED reaching the input point of d := a + b will have a node representing
the expression a + b, indicating that an expression equivalent to it has already
been computed.
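The effect of retaining variable-free entries can be illustrated with a minimal hash-based value numbering sketch over the basic block of Figure 3. This is not the SED transfer function of [3]; it is a local value numbering scheme written for illustration, and the names `value_number_block` and `BLOCK`, as well as the tuple encoding of statements, are assumptions.

```python
def value_number_block(statements):
    """Return the value number of each binary expression, in program order."""
    var_vn = {}    # variable -> value number it currently holds
    expr_vn = {}   # constant or (op, vn, vn) -> value number; entries are never pruned
    computed = []

    def number_of(key):
        # A fresh number is issued only the first time a key is seen.
        return expr_vn.setdefault(key, len(expr_vn))

    def operand(tok):
        # Tokens that are not assigned variables are treated as constants.
        return var_vn[tok] if tok in var_vn else number_of(("const", tok))

    for target, rhs in statements:
        if len(rhs) == 1:
            vn = operand(rhs[0])
        else:
            left, op, right = rhs
            vn = number_of((op, operand(left), operand(right)))
            computed.append(vn)
        var_vn[target] = vn  # redefinition changes the variable, not expr_vn
    return computed

# The basic block of Figure 3, one (target, right-hand side) pair per assignment.
BLOCK = [
    ("x", ("1",)), ("y", ("2",)), ("c", ("x", "+", "y")),
    ("x", ("3",)), ("y", ("4",)), ("c", ("5",)),
    ("a", ("1",)), ("b", ("2",)), ("d", ("a", "+", "b")),
    ("d", ("5",)), ("d", ("a", "+", "b")),
]

vns = value_number_block(BLOCK)
print(len(set(vns)) == 1)  # True: x + y and both a + b share one value number
```

Because the entry for 1 + 2 stays in `expr_vn` after x, y, c and d are redefined, it plays the role of the node <-, +>: no variable holds the value, yet the later occurrences of a + b are still recognized.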

2.3 Complexity analysis of an improved algorithm 

It is clear that the improvements suggested in the previous section will make 
the algorithm more precise. We now show that even with these improvements, 
the time complexity will still remain polynomial. For the Join algorithm, the 
suggested modification is to compute the intersection of every pair of nodes 
from the two SEDs. The number of nodes in any SED is at most the number of 
expressions in the program. Hence a Join will invoke at most O(e^2) Intersect
calls, where e is the number of expressions in the program. Hence the running
time of Join(G1, G2, s') will be O(s' × e^2), a polynomial in the number of
expressions in the program (the use of the counter variable ensures that the
depth of recursion is O(s')). The size of the resulting SED will be at most e,
and hence the time taken for the Join of n SEDs will be O(n × s' × e^2).
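The bound can be written out step by step. The symbols T_Join and T_n are notation introduced here for illustration, and the derivation assumes, following the argument in the text, that each top-level Intersect call is charged O(s') for its bounded recursion.

```latex
\begin{align*}
|G_1| \le e,\ |G_2| \le e
  &\;\Longrightarrow\; \#\,\text{Intersect calls} \le |G_1| \cdot |G_2| \le e^2, \\
T_{\mathrm{Join}}(G_1, G_2, s') &= O(s') \cdot O(e^2) = O(s' \times e^2), \\
T_n &= O(n) \cdot O(s' \times e^2) = O(n \times s' \times e^2).
\end{align*}
```

The last line relies on the observation that every intermediate SED again has at most e nodes, so each successive Join is covered by the same O(s' × e^2) bound.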

2.4 GVN for code optimization 

The GVN algorithm by Kildall was in fact formulated with the aim of detecting
common subexpressions. An optimization using this algorithm also subsumes
local value numbering. The first example shown is an instance of classical
common subexpression elimination and the second is an instance of local value
numbering. Hence the suggested modifications are necessary in order to use the
GVN algorithm by Gulwani and Necula [3] in compiler code optimization.

3 Conclusion 

To the best of our knowledge, Kildall's is the only GVN algorithm that is
complete in detecting all Herbrand equivalences among program expressions.
Gulwani and Necula [3] have already proved that the GVN algorithm by Alpern,
Wegman, and Zadeck [1] and that by Ruthing, Knoop, and Steffen [6] are
incomplete. It is stated in [6] that their algorithm is restricted to the
equality problem of variables. We observe that the same is the case with
Gulwani and Necula [3] (and also with Nie and Cheng [7]). The suggested
modifications are required for the general problem of detecting equivalence
among program expressions.